Mikhail Belkin

Key Scientific Papers and Results [Google Scholar, list of papers]

Toward universal steering and monitoring of AI models [Science, arxiv]
Daniel Beaglehole, Adityanarayanan Radhakrishnan, Enric Boix-Adsera, Mikhail Belkin
Science, 2026.

Modern AI models contain much of human knowledge, yet understanding of their internal representation of this knowledge remains elusive. Characterizing the structure and properties of this representation will lead to improvements in model capabilities and development of effective safeguards. Building on recent advances in feature learning, we develop an effective, scalable approach for extracting linear representations of general concepts in large-scale AI models (language models, vision-language models, and reasoning models). We show how these representations enable model steering, through which we expose vulnerabilities, mitigate misaligned behaviors, and improve model capabilities. Additionally, we demonstrate that concept representations are remarkably transferable across human languages and combinable to enable multi-concept steering. Through quantitative analysis across hundreds of concepts, we find that newer, larger models are more steerable and steering can improve model capabilities beyond standard prompting. We show how concept representations are effective for monitoring misaligned content (hallucinations, toxic content). We demonstrate that predictive models built using concept representations are more accurate for monitoring misaligned content than models that judge outputs directly. Together, our results illustrate the power of using internal representations to map the knowledge in AI models, advance AI safety, and improve model capabilities.

Mechanism for feature learning in neural networks and backpropagation-free machine learning models [Science, arxiv]
Adityanarayanan Radhakrishnan, Daniel Beaglehole, Parthe Pandit, Mikhail Belkin
Science, 2024.

Understanding how neural networks learn features, or relevant patterns in data, for prediction is necessary for their reliable use in technological and scientific applications. In this work, we presented a unifying mathematical mechanism, known as Average Gradient Outer Product (AGOP), that characterized feature learning in neural networks. We provided empirical evidence that AGOP captured features learned by various neural network architectures, including transformer-based language models, convolutional networks, multi-layer perceptrons, and recurrent neural networks. Moreover, we demonstrated that AGOP, which is backpropagation-free, enabled feature learning in machine learning models, such as kernel machines, that a priori could not identify task-specific features. Overall, we established a fundamental mechanism that captured feature learning in neural networks and enabled feature learning in general machine learning models.
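
As a rough illustration of the quantity named in this abstract, the sketch below computes the average gradient outer product of a scalar predictor, that is, the mean over data points x of grad f(x) grad f(x)^T. The toy predictor, its hand-written gradient, and the names `agop` and `grad_f` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def agop(grad_f, X):
    """Average gradient outer product of a scalar predictor.

    Returns (1/n) * sum_i grad_f(x_i) grad_f(x_i)^T over the rows x_i of X.
    """
    G = np.stack([grad_f(x) for x in X])  # shape (n, d): one gradient per sample
    return G.T @ G / len(X)

# Hypothetical toy predictor: f(x) = sin(x[0]) + 0.5 * x[1]**2.
# It ignores the third coordinate, so its gradient there is always zero.
def grad_f(x):
    return np.array([np.cos(x[0]), x[1], 0.0])

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
M = agop(grad_f, X)

# Coordinates the predictor does not use get zero rows and columns in M.
print(np.round(M, 3))
```

In this toy case the third row and column of `M` vanish because the predictor never uses that coordinate, which is the sense in which the matrix highlights task-relevant directions.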

Loss landscapes and optimization in over-parameterized non-linear systems and neural networks [ACHA, arxiv]
Chaoyue Liu, Libin Zhu, Mikhail Belkin
Applied and Computational Harmonic Analysis, 59, 2022.

The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks. The purpose of this work is to propose a modern view and a general mathematical framework for loss landscapes and efficient optimization in over-parameterized machine learning models and systems of non-linear equations, a setting that includes over-parameterized deep neural networks. Our starting observation is that optimization problems corresponding to such systems are generally not convex, even locally. We argue that instead they satisfy PL*, a variant of the Polyak-Lojasiewicz condition, on most (but not all) of the parameter space, which guarantees both the existence of solutions and efficient optimization by (stochastic) gradient descent (SGD/GD). The PL* condition of these systems is closely related to the condition number of the tangent kernel associated with a non-linear system, showing how a PL*-based non-linear theory parallels classical analyses of over-parameterized linear equations. We show that wide neural networks satisfy the PL* condition, which explains the (S)GD convergence to a global minimum. Finally, we propose a relaxation of the PL* condition applicable to “almost” over-parameterized systems.
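
For readers who want the condition in symbols, the following is a sketch under one common normalization (the constant convention may differ from the paper's). The notation $F$, $y$, $K$, $S$, $\beta$, and $\eta$ is introduced here for illustration. It states the condition, shows why the smallest eigenvalue of the tangent kernel controls it for the square loss, and gives the resulting gradient descent rate, which holds as long as the iterates stay in the set where the condition is satisfied.

```latex
% PL* condition with constant \mu > 0 on a set S (one common normalization):
\frac{1}{2}\,\|\nabla L(w)\|^{2} \;\ge\; \mu\, L(w) \qquad \text{for all } w \in S .

% Square loss for a system F(w) = y, with tangent kernel K(w) = DF(w)\,DF(w)^{\top}:
L(w) = \tfrac{1}{2}\,\|F(w) - y\|^{2}, \qquad
\nabla L(w) = DF(w)^{\top}\bigl(F(w) - y\bigr),

\tfrac{1}{2}\,\|\nabla L(w)\|^{2}
  = \tfrac{1}{2}\,\bigl(F(w) - y\bigr)^{\top} K(w)\,\bigl(F(w) - y\bigr)
  \;\ge\; \lambda_{\min}\bigl(K(w)\bigr)\, L(w).

% Gradient descent with step size \eta \le 1/\beta on a \beta-smooth loss,
% assuming the iterates w_t remain in S:
L(w_{t+1}) \;\le\; L(w_t) - \tfrac{\eta}{2}\,\|\nabla L(w_t)\|^{2}
  \;\le\; (1 - \eta\mu)\, L(w_t)
  \quad\Longrightarrow\quad
  L(w_t) \;\le\; (1 - \eta\mu)^{t}\, L(w_0).
```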

Reconciling modern machine learning practice and the bias-variance trade-off [PNAS, arxiv]
Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal
PNAS, 116(32), 2019.

The question of generalization in machine learning—how algorithms are able to learn predictors from a training sample to make accurate predictions out-of-sample—is revisited in light of the recent breakthroughs in modern machine learning technology. The classical approach to understanding generalization is based on bias-variance trade-offs, where model complexity is carefully calibrated so that the fit on the training sample reflects performance out-of-sample. However, it is now common practice to fit highly complex models like deep neural networks to data with (nearly) zero training error, and yet these interpolating predictors are observed to have good out-of-sample accuracy even for noisy data. How can the classical understanding of generalization be reconciled with these observations from modern machine learning practice? In this paper, we bridge the two regimes by exhibiting a new “double descent” risk curve that extends the traditional U-shaped bias-variance curve beyond the point of interpolation. Specifically, the curve shows that as soon as the model complexity is high enough to achieve interpolation on the training sample—a point that we call the “interpolation threshold”—the risk of suitably chosen interpolating predictors from these models can, in fact, be decreasing as the model complexity increases, often below the risk achieved using non-interpolating models. The double descent risk curve is demonstrated for a broad range of models, including neural networks and random forests, and a mechanism for producing this behavior is posited.

Semi-supervised Learning on Riemannian Manifolds [pdf]
Mikhail Belkin, Partha Niyogi
Machine Learning, 56 (Special Issue on Clustering):209–239, 2004.

We study how to use labeled and unlabeled data together for classification under a manifold assumption, providing a regularization framework based on geometric structure of the data.

Laplacian Eigenmaps for Dimensionality Reduction and Data Representation [Neural Computation, pdf]
Mikhail Belkin, Partha Niyogi
Neural Computation, 15(6):1373–1396, 2003.

One of the central problems in machine learning and pattern recognition is to develop appropriate representations for complex data. We consider the problem of constructing a representation for data lying on a low-dimensional manifold embedded in a high-dimensional space. Drawing on the correspondence between the graph Laplacian, the Laplace Beltrami operator on the manifold, and the connections to the heat equation, we propose a geometrically motivated algorithm for representing the high-dimensional data. The algorithm provides a computationally efficient approach to nonlinear dimensionality reduction that has locality-preserving properties and a natural connection to clustering. Some potential applications and a proof of correctness in certain cases are discussed.
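
The general recipe behind this abstract can be sketched in a few lines: build a neighborhood graph with heat-kernel weights, form the graph Laplacian, and embed the data using the eigenvectors with the smallest non-trivial eigenvalues. The code below is a minimal NumPy sketch rather than the authors' implementation. The function name, the defaults for `n_neighbors` and `t`, the dense distance computation, and the noisy-circle test data are all assumptions chosen for illustration, and the graph is assumed to be connected.

```python
import numpy as np

def laplacian_eigenmaps(X, n_components=2, n_neighbors=10, t=1.0):
    """Embed the rows of X using bottom non-trivial eigenvectors of L f = lambda D f."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances (dense; fine for small n).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Indices of the k nearest neighbors of each point, skipping the point itself.
    idx = np.argsort(sq, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    # Heat-kernel weights on the k-nearest-neighbor edges, then symmetrize.
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-sq[rows, cols] / t)
    W = np.maximum(W, W.T)
    d = W.sum(axis=1)                      # vertex degrees
    L = np.diag(d) - W                     # unnormalized graph Laplacian
    # Solve L f = lambda D f via the symmetric form D^{-1/2} L D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)     # eigenvalues in ascending order
    # Drop the trivial eigenvector (eigenvalue 0; assumes a connected graph).
    Y = d_inv_sqrt[:, None] * vecs[:, 1:n_components + 1]
    return Y, vals[1:n_components + 1]

# Hypothetical test data: a noisy circle embedded in three dimensions.
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta), 0.05 * rng.normal(size=200)])
Y, vals = laplacian_eigenmaps(X, n_components=2, n_neighbors=10, t=1.0)
print(Y.shape, vals)
```

The dense eigendecomposition keeps the sketch short. For larger datasets a sparse neighbor graph and a sparse eigensolver would be the more natural choice.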

Review Article

Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation [Acta Numerica, arxiv]
Mikhail Belkin
Acta Numerica, 30, 203–248, 2021.

In the past decade the mathematical theory of machine learning has lagged far behind the triumphs of deep neural networks on practical challenges. However, the gap between theory and practice is gradually starting to close. In this paper I will attempt to assemble some pieces of the remarkable and still incomplete mathematical mosaic emerging from the efforts to understand the foundations of deep learning. The two key themes will be interpolation, and its sibling, over-parameterization. Interpolation corresponds to fitting data, even noisy data, exactly. Over-parameterization enables interpolation and provides flexibility to select a right interpolating model. As we will see, just as a physical prism separates colors mixed within a ray of light, the figurative prism of interpolation helps to disentangle generalization and optimization properties within the complex picture of modern Machine Learning.

Implications and thoughts on AI

Short pieces for broader audiences.

Does AI already have human-level intelligence? The evidence is clear (Nature, 2026): By any reasonable criteria, the vision of human-level machine intelligence laid out by Alan Turing in 1950 is now a reality.
The necessity of machine learning theory in mitigating AI risk (ACM/IMS Journal of Data Science, 2024; blog version, Jul 2023): Thoughts on the necessity, urgency and possibility of deep learning theory for mitigating AI risk.

From the blog Data, Machine Learning and AI

Lena, or the road not taken (Feb 2026): on how machine learning, rather than brain uploads, became the realized path to AI.
Some thoughts on teaching machine learning post-ChatGPT (Nov 2025): on the implications of ChatGPT for teaching machine learning.
TuringFish (Nov 2024): on the meaning of the “TuringFish” illustration (a zebrafish, stripes, and the limits of intelligence).
Copernicus, Darwin and chatGPT (Sep 2023): on the implications of the success of LLMs and their structure for statistics, science and the human condition.
